Semantic object classes in video: A high-definition ground truth database

نویسندگان

  • Gabriel J. Brostow
  • Julien Fauqueur
  • Roberto Cipolla
چکیده

0167-8655/$ see front matter 2008 Elsevier B.V. A doi:10.1016/j.patrec.2008.04.005 * Corresponding author. Address: Computer Visio bridge, United Kingdom. Fax: +44 1223 332662. E-mail address: [email protected] (G.J. Brost URL: http://mi.eng.cam.ac.uk/research/projects/Vid Visual object analysis researchers are increasingly experimenting with video, because it is expected that motion cues should help with detection, recognition, and other analysis tasks. This paper presents the Cambridge-driving Labeled Video Database (CamVid) as the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. The database addresses the need for experimental data to quantitatively evaluate emerging algorithms. While most videos are filmed with fixed-position CCTV-style cameras, our data was captured from the perspective of a driving automobile. The driving scenario increases the number and heterogeneity of the observed object classes. Over 10 min of high quality 30 Hz footage is being provided, with corresponding semantically labeled images at 1 Hz and in part, 15 Hz. The CamVid Database offers four contributions that are relevant to object analysis researchers. First, the per-pixel semantic segmentation of over 700 images was specified manually, and was then inspected and confirmed by a second person for accuracy. Second, the high-quality and large resolution color video images in the database represent valuable extended duration digitized footage to those interested in driving scenarios or ego-motion. Third, we filmed calibration sequences for the camera color response and intrinsics, and computed a 3D camera pose for each frame in the sequences. Finally, in support of expanding this or other databases, we present custom-made labeling software for assisting users who wish to paint precise class-labels for other images and videos. We evaluate the relevance of the database by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation. 2008 Elsevier B.V. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Pixel-wise ground truth annotation in videos

In the last decades, a large diversity of automatic, semi-automatic and manual approaches for video segmentation and knowledge extraction from video-data has been proposed. Due to the high complexity in both the spatial and temporal domain, it continues to be a challenging research area. In order to develop, train, and evaluate new algorithms, ground truth of video-data is crucial. Pixel-wise a...

متن کامل

A data set for evaluating the performance of multi-class multi-object video tracking

One of the challenges in evaluating multi-object video detection, tracking and classification systems is having publically available data sets with which to compare different systems. However, the measures of performance for tracking and classification are different. Data sets that are suitable for evaluating tracking systems may not be appropriate for classification. Tracking video data sets t...

متن کامل

PASCAL Boundaries: A Class-Agnostic Semantic Boundary Dataset

In this paper, we address the boundary detection task motivated by the ambiguities in current definition of edge detection. To this end, we generate a large database consisting of more than 10k images (which is 20× bigger than existing edge detection databases) along with ground truth boundaries between 459 semantic classes including both foreground objects and different types of background, an...

متن کامل

Segmentation and Recognition Using Structure from Motion Point Clouds

We propose an algorithm for semantic segmentation based on 3D point clouds derived from ego-motion. We motivate five simple cues designed to model specific patterns of motion and 3D world structure that vary with object category. We introduce features that project the 3D cues back to the 2D image plane while modeling spatial layout and context. A randomized decision forest combines many such fe...

متن کامل

Object-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest

This study is a contribution to assess the high resolution digital aerial imagery for semi-automatic analysis of tree species identification. To maximize the benefit of such data, the object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using aero-triangulation method. Some appropriate transformations were perfor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Pattern Recognition Letters

دوره 30  شماره 

صفحات  -

تاریخ انتشار 2009